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Summary. This is a collection of three written discussions of the paper “sequential quasi-Monte Carlo sampling" 
by M. Gerber and N. Chopin, following the presentation given before the Royal Statistical Society in London on 
December 10th, 2014. 


1. Consequences on Bayes nonparametrics (J. Arbel and I. Priinster) 


We congratulate the authors for a stimulating paper, accessible to laypersons, and thank them for sharing their 
code. We would like to comment on the potential implication of SQMC in Bayesian nonparametrics. 

Nonparametric mixtures are commonly used for density estimation and can be thought of as an extension of 
finite mixture models when the number of clusters is unknown. The setting is as follows: observations yi:t are 
spread out into kt clusters; the cluster labels, or allocation variables, are denoted by Xi:t, and are interpreted 
as the states of yi-t in the context of SMC. Note that the states are discrete and elements of {1,..., kt}, where 
kt denotes the number of clusters. The ntj observations allocated to the same cluster j are iid from a given 
parametric kernel K{--,9j). This model is static in the sense that data are usually not recorded sequentially. 


However, Polya urn type schemes (IB lack well and MacQueen 

19731 essentially formulate this model sequentially 

by artificially thinking of observation yt to occur at time t. So the nonparametric mixture model can be cast 

as an SMC sampler as follows (see Liu 

1996 

Fearnhead[ 20041 


yt|xt,6» ~ K{yt-,e^J, 

Xt|xi:t_l -- /^(Xt|xi:t_l). 


The predictive distribution is particularly simple for the Dirichlet process (Ferguson 1973 Blackwell and 


MacQueen 19731. Denote by a and Go the precision parameter and the base measure of the Dirichlet process. 


Then the kernel for the states is given by the following probability mass function on {1,..., kt-i + 1} 


TOt(xi,t_i,Xt = j) = ptj oc 


nt-ijKj{yt\xi,t-i) 

aKoiyt) 


if j < kt-i 
if j = kt-i + 1 


( 1 ) 


where Kj and Kq involve integrals of K and Go which can be calculated in conjugate cases and approximated 
otherwise. Note that TOt(xi.t_i, x^) is also available in closed-form expression for broader classes of random prob- 


ability measures including the two-parameter Poisson-Dirichlet process 

Pitman and Yor 

19971 and normalised 

random measures with independent increments ( 

Regazzini et al. 

2003 

James et al. 20091. 


As stressed by Gerber and Chopin, the key ingredient of SQMC is the deterministic transform T*. A possible 
choice for (Q is 


rt(xi:t_i,Ut) = min|j € {1 ,... ,kt-i + 1} : > wj (2) 

i=l 

where U( ~ G([0,1)). The case of discrete proposal mt{xi-.t-i,^t) does not seem to be discussed in the paper. 
Would the authors expect a gain in efficiency by considering a quasi-Monte Carlo alternative as (i) instead of 
standard SMC? 
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2. Spectrum of the sqMC method (C. Robert) 


Quasi-Monte Carlo methods are ignored by the statistical community, despite regular attempts by well-respected 
researchers to induce a broader use of those methods. The authors are to be congratulated and will hopefully 
contribute to a wider diffusion of qMC methods. 

At a purely mathematical level, that randomised low discrepancy sequences produce both unbiased estimators 
and error rates of order log(A^)‘^“^) at cost 0(A^log(fV)) implies that sqMC methods should always be 

replace regular Monte Carlo methods. Since this is not the case, we may wonder at the reasons. The major 
difficulty with qMC stands in its requirement to expressing Monte Carlo estimators as deterministic transforms 
of a fixed number of uniforms. This is a strong deterrent to the dissemination of qMC alternatives, as moving to 
a random number of baseline uniforms does not seem achievable in full generality. It is thus no surprise that the 
proposal appears in connection with particle filters. Indeed, this field centres on dynamic importance sampling 
and hence enjoys a local iid-ness that relates better to qMC integrators than single-chain MCMC algorithms. 
For instance, each particular resampling step consists in multinomial simulation, hence could be turned into 


qMC. Actually, lower variance solutions that amount to qMC, like systematic and residual sampling (Gundersen 


and Jensen 1987 Fearnhead 1998, Chopin 20041, have been proposed in the particle literature. 


Here, the authors apply qMC to the particles themselves, using a global low discrepancy sequence for the 
resampling and the move steps: as the cost of using the Hilbert curve grows quickly with dimension, two 
sequences in lieu of one would bring a significant cost reduction. Moreover, they still assume the deterministic 
transform representation 

= rt(x^_i,0, 

which is a stumbling block for those contemplating switching to qMC, as illustrated in the paper using only 
Normal baseline distributions. In a sequential setting with unknown parameters 0, the transform F changes 
each time 9 is modified, which impacts computing cost when the inverse cdf is not available. Since simulating 9 
is unlikely to benefit from qMC improvements, one may question the relevance of the overall scheme: the most 
variable item in the Gibbs sampling steps expands its inefficiency to the joint kernel. 

In connection with Lemma 1 and the sqMC approximation of the evidence, Rao-Blackwellisation using all 
proposed moves could be considered: is this easier and significant within qMC since the gain may be of a lower 
order? Similarly, the trick at the end of §4.2.1, using sqMC on a single instead of {t + 1) vectors, is unclear, but 
I wonder at a connection with the particle learning literature (Lopes et al. 2010[ Carvalho et al. 20101. 

In conclusion, I am looking forward the aftermath of this Read Paper that will expose whether qMC is 
bound to become the reference in simulation computational statistics or to remain a niche activity away from 
mainstream simulation. 


3. ABC with sqMC (R. Ryder) 


One of the reasons why the authors are so successful in their new methodology is the ability to know in advance 
the number of uniform variables which are needed for one step of the algorithm. This allows greater efficiency 
and, to use the terminology at the end of section 1.2 of the paper, the use of QMC point sets (fixed length) 
instead of QMC sequences (unbounded length). Nonetheless, as noted by the authors, the use of QMC sequences 
is at least theoretically possible. 

When reading the paper, I was immediately curious about the possibility of adapting this methodology in an 
ABC (Approximate Bayesian Computation) setting. In an attempt to develop a naive ABC population Quasi 
Monte Carlo algorithm, it is natural to use QMC sequences instead of QMC point sets, as shown in Algorithm 
which is adapted from Marin et al. ( |2012[ |. 

I tried this algorithm on a toy example: yk ~ N{xk,a'^) where (x^) is a Markov chains taking values 
in {—2,2}, with probability of switching at any iteration equal to 9, and attempted to estimate 9. For the 
generation of QMC sequences, I used the raindtoolbox R package Dutang and Savicky (20131. 

There are two points to note from this simple experiment: 


(a) The QMC version systematically outperforms the basic population ABC sampler, with a variance typically 
about 30 times smaller for an equal number of particles. 

(b) The computational time of the QMC version explodes in a non-linear fashion when the number of particles 
increases, as shown in Figure This is intriguing, and might be due to the cost of generating a QMC 
sequence, at least in the implementation I used. 


There is certainly much more work to be done to adapt these new ideas to an ABC setting. In particular, 
there is one situation where QMC point sets could certainly be useful: in ABC, one typically accepts a proposed 
value 9* it that value leads to simulated data z which is close to the observed data y, in the sense that d(y, z) < e 
for some pseudometric d and threshold e. The threshold e is note always chosen in advance, but is sometimes a 
quantile from a fixed number of attempts: there must surely be situation where QMC could be used to improve 
the estimator for such flavours of ABC. I was surprised not to find any investigation of this idea in the literature. 
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Algorithm 1 Likelihood-free population Quasi Monte Carlo sampler 
Input: observations y, decreasing sequence (ei, ..., e^v), prior tt with cdf 
At iteration t = 1, 

Set j = 1 
for z = 1 to A do 
repeat 

Generate Uj as the jth element in an 1 dimensional RQMC sequence. 

Set = F~^{uj) and z ~ /(z | 

Increment j. 
until p(?7(z),?7(y)) < ei 
Set = 1/A 
end for 

Take Si as twice the empirical variance of the 0|^^’s 
for t = 2 to T do 
Set J = 1 

for z = 1 to A do 

repeat 

Generate {uj,Vj) as the jth element in a 2 dimensional RQMC sequence. 
Select an ancestor 6* from the 0*“^ using Vj as for SQMC 
Generate 0 ^^ using 0 * and Uj as for SQMC 
Increment j. 
until p(z?(z),z7(y)) < et 

Set Jf> „ ,((»;•')/ {e.-T (»'•’ - »'•-") } 

end for 

Take S* as twice the weighted variance of the 

end for 
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Fig. 1. Computational time (in minutes) of the standard population ABC of |l\/larin et al.H2012| (triangles) and QMC popula¬ 
tion ABC (circles) samplers, for increasing number of particles. The former increases linearly, but the latter does not. 
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